With many potential predictors we can eliminate some by examining correlation plots
There is weak correlation between SOC_stock_spline and
the selected covariates. Additionally there is some collinearity between
predictors. The spectral predictors such as NDYI (Normalized Difference
Yellow Index) and NDVI (Normalized Difference Vegetation Index). We can
remove these based on the correlation coefficient > 0.7.
Now we can begin examining the distribution of carbon stock values in the dataset to choose the appropriate transformation. We also scale and center the numeric predictor variables.
Now build models using log transformed carbon stock data. We need to
specify that sample_ID is a random effect because of the
multiple samples at one location. lower_depth will be a
random slope to adjust model based on how it is affected by depth.
Also lower depth is included as a fixed effect to specify an overall
effect of depth in addition to accounting for the random slope within
sample_ID locations. See:
We also use a different optimizer for the lmer model which uses
nloptwrap by default and gives convergence warnings for
multiple model iterations. We use the bobyqa optimizer and
have assessed negligiable changes to log liklihood (< 0.1) and
parameter estimates (< 0.00001). See: - https://cran.r-project.org/web/packages/lme4/vignettes/lmerperf.html
help(convergence)We could use a Generalized Linear Model or a Generalized Linear Mixed Model here too but they often fail to converge. We will model with the log-transformed soil carbon stocks in the linear mixed model
Pairwise comparisons between the top, largest model and the rest show if models with fewer parameters are significantly different. Considering AIC will help determine the most parsimonious model with the best fit to the data.
From the ANOVAs it looks like model 7 has the lower AIC
when compared together.
Model 7 is the lowest AIC
Next test if there is an interaction
The model \(R^2\) for
mod7 is 0.832
Anova between sites
From Yu et al,.
Correlations among predictors for wetlands, uplands, mesic
Partial correlation measures the strength of relationship between two variables (e.g., SOC concentration and MAT) while removing the effects of one or more possibly confounding variables (e.g., geochemical variables) on this relationship. Differences between zero-order (i.e., no other variables have been controlled) and partial correlations indicate the degree of dependency of target predictors (e.g., MAT) on controlled predictors (e.g., geochemical variables).
So the interaction between two predictor variables.
GAMMs because of under prediction
Random Forest